Skip to content

Conversation

@snowmead
Copy link
Contributor

@snowmead snowmead commented Dec 4, 2025

Summary

Adds a versioned migration framework for managing RocksDB column family schema changes. This enables safe removal of deprecated column families when upgrading existing databases, ensuring backward compatibility with older node installations.

Design

The migration system handles a key RocksDB constraint: you cannot open a database without specifying all existing column families. This means deprecated CFs must be discovered and included when opening, then dropped via migration.

Migration Flow:

  1. DB::list_cf() discovers all existing column families (including deprecated ones)
  2. Database opens with union of existing + current CFs
  3. MigrationRunner checks schema version and runs pending migrations
  4. Deprecated CFs are dropped and schema version is updated

Schema version is tracked in a dedicated __schema_version__ column family. Databases without this CF (pre-migration) are treated as version 0.

Notable changes

Migration Framework (client/common/src/migrations/)

  • Migration trait defines versioned migrations with deprecated CF lists
  • MigrationDescriptor captures migration metadata for runtime use
  • MigrationRunner handles version tracking and migration execution
  • MigrationError enum for structured error handling
  • Comprehensive test suite covering fresh databases, existing databases with deprecated CFs, idempotency, and data preservation

V1 Migration

Drops deprecated MSP respond storage request column families:

  • pending_msp_respond_storage_request
  • pending_msp_respond_storage_request_left_index
  • pending_msp_respond_storage_request_right_index

These were replaced with in-memory queueing in MspHandler.

Unified Database Opening

  • TypedRocksDB::open() now handles migrations automatically
  • Removed manual DB::open_cf_descriptors() calls from all stores
  • Consistent CURRENT_COLUMN_FAMILIES pattern across stores

Stores updated:

  • BlockchainServiceStateStore
  • DownloadStateStore
  • BspPeerManagerStore

Adding New Migrations

// 1. Create client/common/src/migrations/v2.rs
pub struct V2Migration;

impl Migration for V2Migration {
    const VERSION: u32 = 2;

    fn deprecated_column_families() -> &'static [&'static str] {
        &["old_cf_to_remove"]
    }

    fn description() -> &'static str {
        "Remove old_cf_to_remove column family"
    }
}

// 2. Register in MigrationRunner::all_migrations()
vec![
    MigrationDescriptor::new::<v1::V1Migration>(),
    MigrationDescriptor::new::<v2::V2Migration>(),
]

snowmead and others added 30 commits November 18, 2025 15:19
- Refactored the storage request submission process in `move-bucket.test.ts` to utilize a batch processing helper
…Postgres DB (#563)

* fix: 🩹 Remove and update old comments

* feat: 🚧 Add `blockchain-service-db` crate and postgres schema

* feat: 🚧 Wire pending tx DB updating to `send_extrinsic`

* feat: 🚧 Add CLI param for db URL, initialise DB on BS startup and update status with watcher

* fix: 🐛 Use i64 for nonce

* feat: ✨ Add pending tx postgres to integration test suite

* fix: 🐛 Wire CLI pending db param to blockchain service initialisation

* test: ✅ Fix passing CLI pending db param in test suites

* feat: ✨ Clear pending txs from DB in finality

* docs: 📝 Document functions in `store.rs` for pending DB

* feat: ✨ Log when a pending tx has a state update but for a different tx hash

* fix: 🐛 Initialise Blockchain Service last processed blocks with genesis

* test: ✅ Fix tests using old indexer db container name

* test: ✅ Add back backend container initialisation

* fix: 🐛 Remove duplicate indexer nodes

* test: ✅ Fix name change mistakenly

* test: ✨ Add new pending DB testing utilities

* fix: 🗑️ Remove deprecated `createApiObject`

* test: ✅ Add persistent pending Tx DB integration tests

* feat: 🚧 Add `load_account_with_states` query to enable re-subscription at startup

* feat: ✨ Re-watch transactions pending after restarting MSP

* test: ✅ Add test for not re-watching extrinsic with nonce below on-chain nonce

* feat: ✨ Add `watched` boolean field to pending db

* feat: ✨ Persist gap filling remark transactions

* fix: ✅ Fix race condition where container wasn't fully paused

* feat: ✨ Add pendingDbUrl option to `addBspContainer` as well

* refactor: 🚚 Rename `insert_sent` to `upsert_sent`

* feat: 🔥 Remove unused `load_active` function from pending db interface

* refactor: ♻️ Use `Vec<String>` directly in `load_resubscribe_rows` params

* feat: 🩹 Remove usage of `sent` state in pending DB

* refactor: ♻️ Use `TransactionStatus` in db interface functions

* fix: 🐛 Track transaciton in `transaction_manager` even with no `call_scale`

* refactor: ♻️ Use constants for container names of DBs and backend

* test: ✅ Add check after sealing block of pending tx not updating

* feat: ✨ Add message in remark fake tx

* fix: ✅ Use new constant instead of hardcoded postgres container name

* fix: 🐛 Resubscribe to pending txs in initial sync handling istead of startup

* fix: 🐛 Set all txs pending to `watched=false` then only those we re-watch back to `true`

* Revert "fix: 🐛 Resubscribe to pending txs in initial sync handling istead of startup"

This reverts commit df6af95.

* fix: 🐛 Try to watch in_block pending transacitons too

* fix: 🐛 Not filter by on-chain nonce when re-subscribing to pending txs

* test: ✅ Improve test error logging

* feat: ✨ Add custom error to submit_and_watch_extrinsic

* fix: 🩹 Log and skip when error in re-subscribing is old nonce

* fix: ✅ Consider node race condition in test
- Consolidate capacity management with single increase per batch
- Add batch trimming to fit within capacity limits
- Implement batch rejection with single extrinsic for efficiency
- Extract helper methods for file metadata construction
- Improve logging for batch processing visibility
- Clean up imports and remove unused file_key_cleanup field
- retry block sealing for checking msp acceptance
Add `msp: Option<(ProviderIdFor<T>, bool)>` field to the NewStorageRequest
event to propagate MSP assignment information through events. This allows
MSP clients to determine if a storage request was created for them and
whether they have already accepted it, without needing to query storage
request metadata from the chain. Prevents the MSP from reaccepting storage requests
The MSP could queue the same file key multiple times for acceptance,
causing MspAlreadyConfirmed errors when batches processed duplicate
entries. This occurred when multiple code paths (BatchProcessStorageRequests
and RemoteUploadRequest handlers) both called on_file_complete for the
same file.

Add a persistent HashSet (CFHashSetAPI) alongside the existing deque to
track pending file keys. Before queueing, check if the file key exists
in the set - skip if present, insert and queue if not. When popping from
the deque for batch processing, remove the file key from the set.
…status tracking

Replace batch-capacity-centric approach with an event-driven per-file processing model:

- Add FileKeyStatus enum (Processing, Accepted, Rejected, Failed, Abandoned) to track
  file key processing state across concurrent event handlers using Arc<RwLock<HashMap>>

- BatchProcessStorageRequests now emits NewStorageRequest events for each pending request
  via new PreprocessStorageRequest command, skipping already-processed/accepted/rejected
  keys and automatically retrying Failed ones

- NewStorageRequest handler performs per-file capacity management, storage creation, and
  P2P upload registration. If file already complete, immediately queues accept response

- ProcessMspRespondStoringRequest uses type-safe pallet_proofs_dealer::Error decoding to
  distinguish proof errors (mark Failed for retry) from non-proof errors (mark Abandoned)

- Move pending_respond_storage_requests queue from RocksDB to in-memory MspHandler struct,
  removing 4 column families (14 -> 10) since this state doesn't need persistence

- Remove batch_reject_storage_requests, ensure_batch_capacity, and trim_batch_to_fit_capacity
  methods as capacity is now managed per-file in NewStorageRequest handler
snowmead and others added 24 commits December 1, 2025 09:30
This file contains local MCP server configuration and should not be tracked in the repository.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…up after fisherman resume

  The `waitForIndexing` helper was using `system.number()` which returns the
  best block (e.g., #28), but the indexer only indexes finalized blocks (e.g.,
  #22). This caused a 30-second timeout waiting for "Indexing block #28" log
  that wouldn't appear until finalization caught up.
* refactor: 🚚 Move release docs to `resources`

* refactor: 🚚 Move config files to dedicated directory

* fix: 📝 Minor fix in release process doc

* refactor: 🚚 Move `backend_config` as well

* fix: 🩹 Remove extra lines in config file
* fix: 🚑 refactor upload code to avoid write lock deadlocks

* fix: 🔊 update logs to print file key and fingerprint as hexadecimal strings

* fix: 🐛 correctly error out when a file inconsistency gets detected

* revert: ⏪ revert changes to file storage
* fix: 🐛 avoid concurrent uploads for the same file key in the backend

* fix: ⏪ undo change in file diesel table
…d-storage-requests

# Conflicts:
#	client/src/tasks/msp_upload_file.rs
Resolved conflict in client/src/tasks/msp_upload_file.rs:
- Kept HEAD's handle_rejected_storage_request logic for proper on-chain rejection
- Updated return type to anyhow::Result<String> per EventHandler trait change from main
- Updated BatchProcessStorageRequests handler to return Result<String>
Add versioned migration framework for managing RocksDB schema changes:
- `MigrationRunner` discovers existing CFs and runs pending migrations
- `TypedRocksDB::open()` now handles migrations automatically
- V1 migration drops deprecated MSP respond storage request CFs

All RocksDB stores (BlockchainService, DownloadState, BspPeerManager)
now use the unified migration system.
@snowmead snowmead added B5-clientnoteworthy Changes should be mentioned client-related release notes D3-trivial👶 PR contains trivial changes that do not require an audit not-breaking Does not need to be mentioned in breaking changes labels Dec 4, 2025
@snowmead snowmead requested a review from ffarall December 4, 2025 18:57
@snowmead snowmead requested a review from TDemeco December 5, 2025 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

B5-clientnoteworthy Changes should be mentioned client-related release notes D3-trivial👶 PR contains trivial changes that do not require an audit not-breaking Does not need to be mentioned in breaking changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants